home *** CD-ROM | disk | FTP | other *** search
- WB-tree File Based Associative String Data Base System.
- Copyright (c) 1991, 1992, 1993 Holland Mark Martin
-
- Permission to use, copy, modify, and distribute this software and its
- documentation for educational, research, and non-profit purposes and
- without fee is hereby granted, provided that the above copyright
- notice appear in all copies and that both that copyright notice and
- this permission notice appear in supporting documentation, and that
- the name of Holland Mark Martin not be used in advertising or
- publicity pertaining to distribution of the software without specific,
- written prior consent in each case. Permission to incorporate this
- software into commercial products can be obtained from Jonathan
- Finger, Holland Mark Martin, 174 Middlesex Turnpike, Burlington, MA,
- 01803-4467, USA. Holland Mark Martin makes no representations about
- the suitability or correctness of this software for any purpose. It
- is provided "as is" without express or implied warranty. Holland Mark
- Martin is under no obligation to provide any services, by way of
- maintenance, update, or otherwise.
-
- DBSCM
-
- DBSCM is a disk based, sorted associative array package (WB)
- integrated into the Scheme implementation SCM. These associative
- arrays consist of keys and values which are strings of length less
- than 256 bytes. Associative arrays can be used to form a MUMPS style
- database which ncan be used to implement standard record structures
- without auxiallary files (see example in example.scm).
-
- Basic operations are creation, destruction, opening and closing of
- diskfiles and arrays, insertion, deletion, retrieval, successor, and
- predecessor (with respect to dictionary order of keys). Functional
- application of find-next, deletion, and modification over a range of
- consecutive key values is supported.
-
- Multiple associative arrays can be stored in one disk file.
- Simultaneous access to multiple disk files is supported. A structure
- checker, garbage collector are included. A repair program and
- ram-disk type file (for temporary structures) are in developement.
-
- The current WB implementation has a file size limit of 2^32 * block
- size (default 2048) = 2^43 bytes (8796 Gbytes). We currently run WB
- with file sizes of several hundred Megabytes. WB does its own memory
- and disk management and maintains an in core cache of recently used
- blocks. The WB implementation adds about 45 Kbytes to the size of
- SCM.
-
- WB is implemented using a B-tree variant. B-trees give slower access
- than hashing but are dynamic and provide an efficient determination of
- successor and predecessor keys. All operations are O(log(n)) in the
- size of the database. B-trees are commonly used by database systems
- for implementing index structures. B-trees are optimized for using
- the minimum number of disk operations for large data structures.
- Prefix and suffix compression are used for storage efficiency.
-
- STATUS CODES
-
- Some of the following calls return status codes defined as follows:
-
- (define SUCCESS 0) ; successful execution
- (define NOTPRES -1) ; successful execution, no data present or no change made
- (define TERMINATED -2) ; failure, no damage, caller can retry operation
- (define RETRYERR -10) ; failure, no damage, caller can retry operation
- (define ARGERR -15) ; failure, no damage, call was in error
- (define NOROOM -20) ; failure, no damage, out of room in file
- (define TYPERR -30) ; failure, file or object was not of correct type
- (define IOERR -40) ; i/o error, DB may be damaged
- (define STRANGERR -45) ; internal error, DB may be damaged
- (define UNKERR -90) ; placeholder code
-
- SEGMENTS
-
- (init-wb max-num-ents max-num-buks max-blk-size) procedure
-
- Initializes the WB system. Max-blk-size determines the size of the
- disk cache buffers. Max-num-ents is the number of disk cache buffers
- to be allocated. (* max-num-ents max-blk-size) should be less than
- the size of RAM on your computer. If not all max-num-ents cannot be
- allocated (by malloc) then WB can still run properly. Max-num-buks is
- the number of hash buckets for the disk cache. It should be of
- approximately the same size as or larger than max-num-ents. The
- number of buffers actually allocated is returned.
-
- If open-seg or create-seg is called when init-wb is not called, then
- WB will be initialized as though init-wb was called with the
- parameters 75, 150, and 2048.
-
- (final-wb) procedure
-
- Frees all memory used by the WB system. All segments will be closed.
-
- (open-seg seg filename mode) procedure
-
- Seg should be a non-negative number. Opens the database seg in
- filename and returns seg if successful, a status code otherwise. The
- database seg will be read-only if the mode argument is 0. It will be
- read-write if the mode argument is 2.
-
- (close-seg seg hammer) procedure
-
- Closes database segment seg and the file containing it. If hammer is
- #f then if there are any problems freeing buffers then the close is
- aborted. A status code is returned.
-
- (make-seg seg filename block-size) procedure
-
- Makes a new empty database seg of name filename. A status code is
- returned. Use open-seg to get the new seg. The block-size is the
- size of B-tree blocks. Unless you are testing splitting algorithms,
- block-size should be 2048.
-
- ASSOCIATIVE ARRAYS
-
- In this group of functions, typ should be either #\D (directory) or
- #\T (regular tree). B-trees with typ #\D which are pointed to by
- special entries in the root block (1) protect all their special
- entries from garbage collection by the check program. #\T is for
- regular (data) arrays.
-
- (create-db seg typ name) procedure
-
- Returns a B-tree whose name has been entered in the root directory.
-
- (open-db seg name) procedure
-
- Returns the B-tree whose name has been entered in the root directory
- or #f if not found
-
- LOWER LEVEL OPERATIONS
-
- The write-control-bits argument (wcb) to these functions controls the
- latency of updates to the file after various operations. These bits
- are defined as follows:
-
- (1) SAP: save block after PUTs
- (2) SAR: save block after REMOVEs
- (4) SAC: force block save after cached block changes
- (8) FAC: flush buffer entirely after cached block changes
- (not currently implemented)
-
- (create-bt seg typ wcb) procedure
-
- Creates a new root block in seg seg of type typ and returns a
- bt-handle open to it. This would typically be used to create a
- temporary b-tree which should be reclaimed by check if system crashes.
-
- (open-bt seg blknum wcb) procedure
-
- Returns a bt-handle open to seg number seg, block number blknum. If
- no such block exists or is not a root block a status code is returned.
-
- (str2long string index) procedure
-
- Block numbers and some other integer quantities are stored in the
- database as four-byte pointers. Str2long converts the 4 bytes in
- string starting at index into an unsigned integer.
-
- (long2str! string index integer) procedure
-
- Stores integer into 4 bytes of string starting at index.
-
- RECORD OPERATIONS
-
- (bt:get han key) procedure
-
- Returns a string of the value associated with key in the bt which han
- is open to. Returns #f if key is not in the bt. Key is a string.
-
- (bt:next han key) procedure
-
- Returns the next key in bt han or #f if none.
-
- (bt:prev han key) procedure
-
- Returns the previous key in bt han or #f if none.
-
- (bt:put! han key val) procedure
-
- Associates key with val in the bt han. A status code is returned.
-
- (bt:rem! han key) procedure
-
- Removes key and it's associated value from bt han.
-
- SEMAPHORE OPERATIONS
-
- These 2 calls can be used for locking and synchronizing processes.
-
- (bt:put han key val) procedure
-
- Associates key with val in the bt han only if key was previously
- empty. Returns #t for success, #f for failure.
-
- (bt:rem han key) procedure
-
- Removes key and it's associated value from bt han only if key is
- present. Returns key's value for success, #f for failure (not
- present).
-
- MULTIPLE OPERATIONS
-
- (bt:rem* han key1 key2) procedure
-
- Removes keys (and their associated values) between (including) key1
- and (not including) key2 from bt han. A status code is returned.
-
- (bt:scan han op key1 key2 func blklimit) procedure
-
- Applies procedure func to a range of keys (and values) and either
- counts, modifies, or deletes those associations. A list of a status
- code, the count of records processed, and a new value for key1 is
- returned.
-
- If op is -1, indicated keys will be deleted; If op is 0, indicated
- keys will be counted; If op is 1, the value of each key in the range
- will be changed to be the string returned by func.
-
- Func is called with 2 string arguments, the key and the value. If op
- is 1 and FUNC returns #f then no change will be made. If op is 1 and
- func returns a string then that string will replace the value for that
- key. For the other (count, delete) modes, func should return #f or
- #t. If func is #t, the association will be counted (and possibly
- deleted).
-
- Keys from key1 (inclusive) up to key2 (exclusive) are scanned. If
- blklimit is -1 then the entire range is scanned; otherwise only as
- many blocks (internal database structures) as specified by blklimit
- are scanned. The scan can be restarted by using the returned
- information. For instance, the following expression counts the number
- of keys between "3" and "4" one block at a time and returns a list of
- the number of keys and the number of blocks.
-
- (do ((res (bt:scan current-bt 0 "3" "4" (lambda (k v) #t) 1)
- (bt:scan current-bt 0 (caddr res) "4" (lambda (k v) #t) 1))
- (blks 0 (+ 1 blks))
- (cnt 0 (+ cnt (cadr res))))
- ((not (= -1 (car res))) (list (+ cnt (cadr res)) (+ 1 blks))))
-
- It is good to specify a positive blklimit when dealing with large
- scans. More details on the operation of scan can be found in
- scan.scm.
-
- DIAGNOSTICS
-
- The value returned by the following routines is unspecified.
-
- (check-access)
-
- Checks that buffers and locks are in a consistent state and fixes them
- if WB routines have been interrupted.
-
- (clear-stats)
-
- Resets all the collected statistics to 0.
-
- (stats)
-
- Prints a summary of operational statistics since the last clear-stats
- or cstats.
-
- (cstats)
-
- Prints a summary of operational statistics since the last clear-stats
- or cstats, then calls clear-stats.
-
- (show-buffers)
-
- Prints a table of the status of all disk cache buffers.
-